Search CORE

77 research outputs found

Terminology extraction: an analysis of linguistic and statistical approaches

Author: Pazienza MT
Pennacchiotti M
Zanzotto FM
Publication venue: Springer
Publication date: 01/01/2005

Are linguistic properties and behaviors important for recognizing terms? Are statistical measures effective for extracting terms? Is it possible to capture a sort of termhood with computational linguistic techniques? Or are terms too sensitive to exogenous and pragmatic factors that cannot be confined within computational linguistics? All these questions are still open. This study tries to contribute to the search for an answer, in the belief that it can be found only through a careful experimental analysis of real case studies and a study of their correlation with theoretical insights.

ART

Generic ontology learners on application domains

Author: Fallucchi F
Pazienza MT
Zanzotto FM
Publication date: 01/01/2010

ART

Flames recognition for opinion mining

Author: Lungu I
Pazienza MT
Tudorache A

The emerging worldwide e-society creates new ways of interaction between people with different cultures and backgrounds. Communication systems such as forums, blogs, and comments are easily accessible to end users. In this context, managing user-generated content has proved to be a difficult but necessary task. Studying and interpreting user-generated data and text available on the Internet is a complex and time-consuming task for any human analyst. This study proposes an interdisciplinary approach to modelling the flaming phenomenon (hot, aggressive discussions) in online Italian forums. The model is based on the analysis of psychological, cognitive, and linguistic interaction modalities among web communities' participants, on state-of-the-art machine learning techniques, and on natural language processing technology. Virtual communities' administrators, moderators, and users could benefit directly from this research. A further positive outcome is the opportunity to better understand and model the dynamics of web forums as the basis for developing commercially oriented opinion mining applications.

ART

Bridging the demand and the offer in data science

Author: Buitelaar P
Bär N
Frantzi KT
Jain AK
Johnson SC
Kersten PR
Kessler R
Lin J
Maynard D
Menczer F
Pazienza MT
Shrivastava V
Vadeyar DA
Vetere G
Publication venue: Wiley
Publication date: 01/01/2019

During the last several years, we have observed an exponential increase in the demand for Data Scientists in the job market. As a result, a number of trainings, courses, books, and university educational programs (at undergraduate, graduate, and postgraduate levels) have been labeled as “Big Data” or “Data Science”; their common thread is the aim of training people with the right competencies and skills to satisfy the needs of the business sector. In this paper, we report on some of the exercises done in analyzing the current Data Science education offer and matching it with the needs of the job market, in order to propose a scalable matching service, i.e., COmpetencies ClassificatiOn (E-CO-2), based on Data Science techniques. The E-CO-2 service can help to extract relevant information from Data Science-related documents (course descriptions, job ads, blogs, or papers), which enables the comparison of demand and offer in the field of Data Science education and HR management, ultimately helping to establish the profession of Data Scientist.

Crossref

NORA - Norwegian Open Research Archives

UiS Brage

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Creating a medical dictionary using word alignment: The influence of sources and resources

Author: FJ Och
Hans Åhlfeldt
Håkan Petersson
ID Melamed
J Foo
L Ahrenberg
LR Dice
M Merkel
Magnus Merkel
Mikael Nyström
MT Pazienza
Nordic Medico-Statistical Committee
P Tapanainen
PF Brown
Socialstyrelsen
WA Gale
World Health Organization
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Automatic word alignment of parallel texts with the same content in different languages is used, among other things, to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH, and on how the terminology systems and types of resources influence the quality.
Methods: We automatically word aligned the terminology systems using static resources, such as dictionaries; statistical resources, such as statistically derived dictionaries; and training resources, which were generated from manual word alignment. We varied which parts of the terminology systems we used to generate the resources, which parts we word aligned, and which types of resources we used in the alignment process, in order to explore the influence of the different terminology systems and resources on recall and precision. After the analysis, we used the best configuration of the automatic word alignment to generate candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary.
Results: The results indicate that more resources and resource types give better results, but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10, and resources generated from MeSH were not as general as other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static resources, statistical resources and training resources, and noticeably better results than a union of static resources and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms.
Conclusion: More resources give better results in the automatic word alignment, but some resources only give small improvements. The most important type of resource is training, and the most general resources were generated from ICD-10.

Publikationer från Linköpings universitet

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

A survey of automatic term extraction for Brazilian Portuguese

Author: A Barrón-Cedeño
A Di Felippo
A Houaiss
AC Gianoti
AE Sanchez
Ahmad K
Ariani Di Felippo
BM Nogueira
C Kit
C Manning
C Zavaglia
CD Manning
D Jurafsky
D Maynard
DF Honorato
Dice LR
E Bick
F Muniz
FAM Muniz
G Salton
GMB Almeida
GMdB Almeida
H Schmid
I Korkontzelos
I Witten
J Carletta
J Carroll
J Vivaldi
JC Sager
JFG Oliveira
JS Coleti
Junior
JWC Souza
K Kageura
KT Frantzi
KW Church
L Ahrenberg
L Liu
L Lopes
LA Barros
LC Ribeiro Junior
LHM Oliveira
MAI Gonzalez
Merley da Silva Conrado
MF Moura
MF Teline
MS Conrado
MT Cabré
MT Pazienza
MV Soares
NFF Ebecken
P Cardoso
R Estopá
R Nazar
RJ Coulthard
RP Batista
S Banerjee
S David
S Tagnin
SM Aluísio
SN Kim
Solange Oliveira Rezende
T Liu
Thiago Alexandre Salgueiro Pardo
VD Feltrim
VL Lima
Voutilainen A
Y Park
Publication venue: Springer Science and Business Media LLC

Crossref

CoDHIR - an information-retrieval system based on semantic document representation

Author: Marega R
Pazienza MT
Publication venue: Bowker-Saur Ltd
Publication date: 01/01/1994

An information retrieval (IR) system, implemented as part of a content-driven hypertextual information retrieval (CoDHIR) project, is described. This work focuses on the use of semantic information that can be automatically acquired by applying natural language processing (NLP) techniques to texts. The information is represented using conceptual graphs. The problem of synonyms and homonyms is addressed in our system by using a model based on the interpretation of conceptual graphs extracted from texts. The detection of contextual roles of words allows an improvement in retrieval precision over traditional IR technologies. Ranking of documents, based on document relevance, is obtained by extending the vector space model into an oblique space and taking into account the relevance among different word pairs.
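The ranking idea in this abstract can be illustrated with a minimal sketch. It is not the CoDHIR implementation, which the abstract does not detail. It assumes that "oblique space" means the term axes are no longer orthogonal, so the inner product between a query and a document becomes q^T G d, where G holds pairwise term relatedness. The vocabulary, the matrix values, and the function names below are all hypothetical.

```python
import math

def oblique_dot(u, v, gram):
    """Inner product u^T G v; gram[i][j] is the assumed relatedness of terms i and j."""
    return sum(u[i] * gram[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))

def oblique_cosine(u, v, gram):
    """Cosine similarity measured in the space whose term axes are described by gram."""
    denom = math.sqrt(oblique_dot(u, u, gram) * oblique_dot(v, v, gram))
    return oblique_dot(u, v, gram) / denom if denom else 0.0

def rank(query, docs, gram):
    """Return (document name, score) pairs sorted by decreasing similarity to the query."""
    scored = [(name, oblique_cosine(query, vec, gram)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical three-term vocabulary: ["car", "automobile", "banana"].
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]   # classic vector space model: orthogonal term axes
OBLIQUE = [[1.0, 0.8, 0.0],
           [0.8, 1.0, 0.0],
           [0.0, 0.0, 1.0]]    # invented value: "car" and "automobile" are related

docs = {"d_auto": [0.0, 1.0, 0.0], "d_fruit": [0.0, 0.0, 1.0]}
query = [1.0, 0.0, 0.0]        # the query mentions only "car"

print(rank(query, docs, IDENTITY))
print(rank(query, docs, OBLIQUE))
```

With orthogonal axes the query shares no term with either document, so both score zero. With the oblique matrix, the document that mentions only "automobile" is ranked first, which is the synonym effect the abstract points to.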

ART

AI*IA 2007: artificial intelligence and human-oriented computing: 10th Congress of the Italian association for artificial intelligence, Rome, Italy, September 10-13, 2007, Proceedings

Author: Basili R
Pazienza MT
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

ART

CERN Document Server

Semi-automatic ontology development: processes and resources

Author: Pazienza MT
Stellato A
Publication venue: IGI Global
Publication date: 01/01/2012

The exploitation of theoretical results in knowledge representation, language standardization by the W3C, and data publication initiatives such as Linked Open Data have given a level of concreteness to the field of ontology research. In light of these recent outcomes, ontology development has also found its way to the forefront, benefiting from years of R&D on development tools. Semi-Automatic Ontology Development: Processes and Resources includes state-of-the-art research results aimed at making the automation of ontology development processes and the reuse of external resources a reality, and is thus of interest to a wide and diversified community of users. This book provides a thorough overview of the current efforts on this subject and suggests common directions for interested researchers and practitioners.

ART

An environment for semi-automatic annotation of ontological knowledge with linguistic content.

Author: Pazienza MT
Stellato A
Publication venue: Springer Verlag
Publication date: 01/01/2006

Both the multilingual aspects that characterize the (Semantic) Web and the demand for more easy-to-share forms of knowledge representation, equally accessible to humans and machines, drive the need for a more "linguistically aware" approach to ontology development. Ontologies should thus express knowledge by associating formal content with explicative linguistic expressions, possibly in different languages. By adopting such an approach, the intended meaning of concepts and roles becomes more clearly expressed for humans, thus facilitating (among other things) the reuse of existing knowledge, while automatic content mediation between autonomous information sources stands a far better chance of succeeding. In past work we introduced OntoLing [7], a Protégé plug-in offering a modular and scalable framework for performing manual annotation of ontological data with information from different, heterogeneous linguistic resources. We now present an improved version of OntoLing, which supports the user with automatic suggestions for enriching ontologies with linguistic content. Different specific linguistic enrichment problems are discussed, and we show how they have been tackled, considering both algorithmic aspects and profiling of user interaction inside the OntoLing framework. © Springer-Verlag Berlin Heidelberg 2006

ART